DUOINSPECT- Uncovering Duplicates through Powerful Model of Random Forest and XGBoost Classifier

Abstract

Duplicate question pair detection plays a vital role in improving information retrieval systems and enhancing user experience. In this paper, we present a comprehensive study on duplicate question pair detection utilizing the Quora dataset. We employed machine learning techniques, specifically Random Forest and XGBoost classifiers, to develop accurate models for identifying duplicate question pairs. To improve the performance of the models, we introduced additional features to the dataset, augmenting the original data. By incorporating 22 extra features derived from the raw data, we aimed to capture more nuanced patterns and increase the models' discriminatory power. One of the two models achieved a significant improvement, reaching 89.4% accuracy compared with an initial 73%. The other classifier also showed promising results, with accuracy rising from 73.4% initially to 79.2% after adding the features. This paper serves as a valuable reference for researchers and practitioners interested in the field of duplicate question pair detection. The findings highlight the effectiveness of the two classifiers in combination with the additional features for this task. The web application provides a practical and user-friendly tool for real-time duplicate question detection, offering potential applications in information retrieval systems, chatbots, and question-answering platforms.

Key Words: Question Pair Similarity; XGBoost; Random Forest; Machine Learning.
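The abstract does not list the 22 derived features, so the following is only a minimal sketch of what hand-crafted features for a question pair might look like. The function names `tokenize` and `pair_features` and the specific features (length differences, shared-word counts, Jaccard overlap) are illustrative assumptions, not the authors' feature set.

```python
import re


def tokenize(text):
    """Lowercase a question and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_features(q1, q2):
    """Return illustrative (hypothetical) features for one question pair."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    common = s1 & s2
    union = s1 | s2
    both_nonempty = bool(t1) and bool(t2)
    return {
        # Absolute difference in raw character length
        "len_diff_chars": abs(len(q1) - len(q2)),
        # Absolute difference in token count
        "len_diff_words": abs(len(t1) - len(t2)),
        # Number of distinct words appearing in both questions
        "common_words": len(common),
        # Shared words relative to the combined vocabulary sizes
        "word_share": len(common) / (len(s1) + len(s2)) if (s1 or s2) else 0.0,
        # Jaccard similarity of the two word sets
        "jaccard": len(common) / len(union) if union else 0.0,
        # Whether the questions start / end with the same word
        "same_first_word": int(both_nonempty and t1[0] == t2[0]),
        "same_last_word": int(both_nonempty and t1[-1] == t2[-1]),
    }


if __name__ == "__main__":
    feats = pair_features(
        "How do I learn Python quickly?",
        "What is the fastest way to learn Python?",
    )
    for name, value in feats.items():
        print(f"{name}: {value}")
```

Feature dictionaries like these could be stacked into a numeric matrix and passed to tree-based classifiers such as scikit-learn's `RandomForestClassifier` or XGBoost's `XGBClassifier`. That training step is omitted here because the abstract gives no details of the configuration used.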

Similar resources

Thresholding a Random Forest Classifier

The original Random Forest derives the final result with respect to the number of leaf nodes voted for the corresponding class. Each leaf node is treated equally and the class with the most number of votes wins. Certain leaf nodes in the topology have better classification accuracies and others often lead to a wrong decision. Also the performance of the forest for different classes differs due ...

Uncovering a Novel Mechanism for Gene Expression Regulation with a Random Forest Classifier

Trypanosoma brucei is a devastating parasite which infects tens of thousands of people in sub-Saharan Africa, causing a deadly disease known as African sleeping sickness [1]. The parasite is transmitted from human to human via an insect vector known as the tsetse fly, which feeds on human blood [1]. T. brucei possesses a highly unusual genome. Unlike most organisms in its phylogenetic domain, t...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Random Forest Classifier Based ECG Arrhythmia Classification

Heart Rate Variability (HRV) analysis is a non-invasive tool for assessing the autonomic nervous system and for arrhythmia detection and classification. This paper presents a Random Forest classifier based diagnostic system for detecting cardiac arrhythmias using ECG data. The authors use features extracted from ECG signals using HRV analysis and DWT for classification. The experimental results...

Journal

Journal title: Indian Scientific Journal Of Research In Engineering And Management

Year: 2023

ISSN: 2582-3930

DOI: https://doi.org/10.55041/ijsrem24888